MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts
As an extension of the traditional visual single object tracking (SOT) task [2, 3, 4], VLT can harness the complementary advantages of multiple modalities. Therefore, vision-language trackers (VLTs) have the potential to achieve more promising tracking performance, which has recently attracted widespread attention [5, 6, 7, 8].
Ordered Memory
Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron C. Courville
We also introduce a new Gated Recursive Cell to compose lower level representations into higher level representation. We demonstrate that our model achieves strong performance on the logical inference task (Bowman et al., 2015) and the ListOps (Nangia and Bowman, 2018) task. We can also interpret the model to retrieve the induced tree structure, and find that these induced structures align with the ground truth.
- North America > Canada > Quebec (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
Unlocking the Power of Rehearsal in Continual Learning: A Theoretical Perspective
Deng, Junze, Wu, Qinhang, Ju, Peizhong, Lin, Sen, Liang, Yingbin, Shroff, Ness
Rehearsal-based methods have shown superior performance in addressing catastrophic forgetting in continual learning (CL) by storing a subset of past data and training on it alongside new data in the current task. While such a concurrent rehearsal strategy is widely used, it remains unclear if this approach is always optimal. Inspired by human learning, where sequentially revisiting tasks helps mitigate forgetting, we explore whether sequential rehearsal can offer greater benefits for CL compared to standard concurrent rehearsal. To address this question, we conduct a theoretical analysis of rehearsal-based CL in overparameterized linear models, comparing two strategies: 1) Concurrent Rehearsal, where past and new data are trained together, and 2) Sequential Rehearsal, where new data is trained first, followed by revisiting past data sequentially. By explicitly characterizing forgetting and generalization error, we show that sequential rehearsal performs better when tasks are less similar. These insights further motivate a novel Hybrid Rehearsal method, which trains similar tasks concurrently and revisits dissimilar tasks sequentially. We characterize its forgetting and generalization performance, and our experiments with deep neural networks further confirm that the hybrid approach outperforms standard concurrent rehearsal. This work provides the first comprehensive theoretical analysis of rehearsal-based CL.
- North America > United States (0.46)
- North America > Canada (0.28)
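The two rehearsal strategies named in the abstract above can be illustrated with a minimal sketch. It assumes a linear model trained by plain gradient descent on squared loss; the helper names (`gd_fit`, `concurrent_rehearsal`, `sequential_rehearsal`), the step size, the epoch count, and the toy data are all hypothetical choices for illustration, not the authors' setup or analysis.

```python
def predict(w, x):
    """Linear model output <w, x>."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    """Mean squared error of w over (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def gd_fit(w, data, lr=0.1, epochs=500):
    """Full-batch gradient descent on squared loss, starting from w (assumed hyperparameters)."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def concurrent_rehearsal(w, new_data, memories):
    """Train on the new data and all stored past samples together."""
    replay = [pair for mem in memories for pair in mem]
    return gd_fit(w, new_data + replay)


def sequential_rehearsal(w, new_data, memories):
    """Train on the new data first, then revisit each past task's memory in turn."""
    w = gd_fit(w, new_data)
    for mem in memories:
        w = gd_fit(w, mem)
    return w


# Hypothetical toy data: 4 parameters, one stored sample per past task (overparameterized).
memories = [[([1.0, 0.0, 0.0, 0.0], 1.0)], [([0.0, 1.0, 0.0, 0.0], -1.0)]]
new_data = [([0.0, 0.0, 1.0, 0.0], 2.0)]
w0 = [0.0] * 4
w_con = concurrent_rehearsal(w0, new_data, memories)
w_seq = sequential_rehearsal(w0, new_data, memories)
```

The toy inputs here are mutually orthogonal, so both strategies end up fitting every sample; the two procedures can give different weights once task inputs overlap, which is the regime the abstract compares.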
Scaling Law for Stochastic Gradient Descent in Quadratically Parameterized Linear Regression
Ding, Shihong, Zhang, Haihan, Zhao, Hanzhen, Fang, Cong
In machine learning, a scaling law describes how model performance improves as the model and data size scale up. From a learning theory perspective, this class of results establishes upper and lower generalization bounds for a specific learning algorithm. Here, running the exact algorithm under a specific model parameterization often provides a crucial implicit regularization effect, leading to good generalization. To characterize the scaling law, previous theoretical studies mainly focus on linear models, whereas feature learning, a notable process that contributes to the remarkable empirical success of neural networks, has been left unaddressed. This paper studies the scaling law for linear regression with a quadratically parameterized model. We consider infinite-dimensional data and a ground-truth slope, both signals exhibiting certain power-law decay rates. We study convergence rates for Stochastic Gradient Descent (SGD) and demonstrate that the learning rates for the variables automatically adapt to the ground truth. As a result, for canonical linear regression, we provide explicit separations between the generalization curves of SGD with and without feature learning, and the information-theoretic lower bound that is agnostic to the parametrization method and the algorithm. Our analysis for decaying ground truth provides a new characterization of the learning dynamics of the model.
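As a rough illustration of SGD on a quadratically parameterized linear regression, the sketch below assumes the predictor takes the form $\langle x, v \odot v \rangle$ with squared loss. The abstract does not state the exact parameterization, so this form, the function names, and the step size are assumptions. Under this form the update to each $v_j$ is proportional to $v_j$ itself, which is one way a per-coordinate effective learning rate can arise.

```python
def effective_weights(v):
    """Linear weights induced by the assumed quadratic parameterization w_j = v_j ** 2."""
    return [vj * vj for vj in v]


def sgd_quadratic(samples, v0, lr):
    """One pass of SGD on 0.5 * (<x, v*v> - y)**2 over the (x, y) samples."""
    v = list(v0)
    for x, y in samples:
        err = sum(vj * vj * xj for vj, xj in zip(v, x)) - y
        # d/dv_j of the loss is err * 2 * v_j * x_j, so the step scales with v_j.
        v = [vj - lr * err * 2.0 * vj * xj for vj, xj in zip(v, x)]
    return v


# Hypothetical one-dimensional example: the target weight is 4, so v should approach 2.
samples = [([1.0], 4.0)] * 200
v = sgd_quadratic(samples, [1.0], lr=0.05)
w = effective_weights(v)
```

Note that a coordinate initialized at exactly zero never moves under this update, so a nonzero initialization is needed in this sketch.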
Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises: High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation
Recently, several studies have considered stochastic optimization in a heavy-tailed noise regime, i.e., the difference between the stochastic gradient and the true gradient is assumed to have a finite $p$-th moment (say, upper bounded by $\sigma^{p}$ for some $\sigma\geq0$) where $p\in(1,2]$, which not only generalizes the traditional finite variance assumption ($p=2$) but also has been observed in practice for several different tasks. Under this challenging assumption, much progress has been made for both convex and nonconvex problems; most of these works, however, consider only smooth objectives. In contrast, the problem remains neither fully explored nor well understood when the functions are nonsmooth. This paper aims to fill this crucial gap by providing a comprehensive analysis of stochastic nonsmooth convex optimization with heavy-tailed noises. We revisit a simple clipping-based algorithm, which was previously proved to converge only in expectation and only under an additional strong convexity assumption. Under appropriate choices of parameters, for both convex and strongly convex functions, we not only establish the first high-probability rates but also give refined in-expectation bounds compared with existing works. Remarkably, all of our results are optimal (or nearly optimal up to logarithmic factors) with respect to the time horizon $T$ even when $T$ is unknown in advance. Additionally, we show how to make the algorithm parameter-free with respect to $\sigma$; in other words, the algorithm can still guarantee convergence without any prior knowledge of $\sigma$. Furthermore, an initial distance adaptive convergence rate is provided if $\sigma$ is assumed to be known.
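The general idea of a clipping-based stochastic subgradient method can be sketched as follows. This is a generic illustration, not the paper's algorithm: the constant step size, the fixed clipping threshold `tau`, the averaged output, and the symmetric Pareto noise oracle are all assumptions chosen for the example, since the abstract does not give the parameter schedules.

```python
import math
import random


def clip(g, tau):
    """Rescale vector g so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(x * x for x in g))
    if norm <= tau or norm == 0.0:
        return list(g)
    return [x * tau / norm for x in g]


def clipped_subgradient_method(oracle, x0, steps, step_size, tau):
    """Iterate x <- x - step_size * clip(g, tau) and return the running average of the iterates."""
    x = list(x0)
    avg = [0.0] * len(x)
    for t in range(1, steps + 1):
        g = clip(oracle(x), tau)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
        avg = [a + (xi - a) / t for a, xi in zip(avg, x)]
    return avg


def noisy_l1_subgradient(x, rng):
    """Subgradient of f(x) = ||x||_1 plus zero-mean heavy-tailed noise (hypothetical oracle).

    The noise is a random sign times a shifted Pareto(1.5) draw, whose p-th moment
    is finite only for p < 1.5, so its variance is infinite.
    """
    out = []
    for xi in x:
        sign = (xi > 0) - (xi < 0)
        noise = rng.choice((-1.0, 1.0)) * (rng.paretovariate(1.5) - 1.0)
        out.append(sign + noise)
    return out


rng = random.Random(0)
x_bar = clipped_subgradient_method(
    lambda x: noisy_l1_subgradient(x, rng),
    x0=[5.0, -5.0], steps=2000, step_size=0.01, tau=5.0,
)
```

Clipping bounds the length of every step by `step_size * tau`, so a single extreme noise draw cannot move the iterate arbitrarily far.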
Non-IID Quantum Federated Learning with One-shot Communication Complexity
Federated learning refers to the task of machine learning based on decentralized data from multiple clients with secured data privacy. Recent studies show that quantum algorithms can be exploited to boost its performance. However, when the clients' data are not independent and identically distributed (IID), the performance of conventional federated algorithms is known to deteriorate. In this work, we explore the non-IID issue in quantum federated learning with both theoretical and numerical analysis. We further prove that a global quantum channel can be exactly decomposed into local channels trained by each client with the help of local density estimators. This observation leads to a general framework for quantum federated learning on non-IID data with one-shot communication complexity. Numerical simulations show that the proposed algorithm outperforms the conventional ones significantly under non-IID settings.
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan (0.04)
- Asia > China > Beijing > Beijing (0.04)
A Set of Recommendations for Assessing Human-Machine Parity in Language Translation
Läubli, Samuel, Castilho, Sheila, Neubig, Graham, Sennrich, Rico, Shen, Qinlan, Toral, Antonio
The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations. We reassess Hassan et al.'s 2018 investigation into Chinese-to-English news translation, showing that the finding of human-machine parity was due to weaknesses in the evaluation design, which is currently considered best practice in the field. We show that the professional human translations contained significantly fewer errors, and that perceived quality in human evaluation depends on the choice of raters, the availability of linguistic context, and the creation of reference translations. Our results call for revisiting current best practices to assess strong machine translation systems in general and human-machine parity in particular, for which we offer a set of recommendations based on our empirical findings.
- Asia > China > Hong Kong (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Asia > China > Shandong Province > Qingdao (0.04)
- (19 more...)
A Set of Recommendations for Assessing Human–Machine Parity in Language Translation
Läubli, Samuel (University of Zurich) | Castilho, Sheila (Dublin City University) | Neubig, Graham (Carnegie Mellon University) | Sennrich, Rico (University of Edinburgh) | Shen, Qinlan (Carnegie Mellon University) | Toral, Antonio (University of Groningen)
A Variational Time Series Feature Extractor for Action Prediction
Chaveroche, Maxime, Malaisé, Adrien, Colas, Francis, Charpillet, François, Ivaldi, Serena
The problem of recognizing actions or activities has been widely addressed in the computer vision research community: it consists of classifying a fully or partially observed action, typically observed through cameras or external motion capture [2]. In robotics, recognizing human activity is paramount for enabling proper interaction and providing assistance to the human: an assistive device or prosthesis could switch control modes depending on the current human activity (e.g., walking or sitting) [3], [4]; a mobile robot may adapt its navigation depending on the prediction of the human motion [5]. More generally, prediction is important to provide the robot with anticipation capabilities [6]. In collaborative robotics applications in manufacturing, such as in assembly lines, recognizing the current activity of the operator is necessary for ergonomics evaluations [7] and for the optimization of the robot actions. However, there are two critical issues that prevent the direct application of existing techniques in such scenarios. The first issue is the availability of external sensing devices (cameras or motion capture systems), which constrains many tasks and application scenarios.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > France > Hauts-de-France > Oise > Compiègne (0.04)
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)